Gene Onoltogy Analysis on day50 Heterozygous

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.SE_deseq2_HT.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.DEAList_HT.rds - Class: character"
## [1] "Parameter: HT  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.deseqHTvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day50/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day50/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_day50_HT - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
library(data.table)
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$HT$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$HT$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

17673 genes in 14 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 6638 differentially expressed genes
  • FDR < 0.05: 5601 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 2202 differentially expressed genes
  • FDR < 0.05 and FC > 2: 1136 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 5601 genes are selected: 4905 up-regulated genes and 4894 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 343 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 17531 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: ggrepel: 100 unlabeled data points (too many overlaps). Consider increasing max.overlaps


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 5601 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 17673 genes
  • modulated genes: 2350 genes
  • down-regulated genes: 1151 genes of interest
  • up-regulated genes: 1199 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 2350 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannHT <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannHT, ontology='BP', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 7512 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  8 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  12 nodes to be scored   (31 eliminated genes)
## 
##   Level 16:  29 nodes to be scored   (54 eliminated genes)
## 
##   Level 15:  69 nodes to be scored   (143 eliminated genes)
## 
##   Level 14:  122 nodes to be scored  (357 eliminated genes)
## 
##   Level 13:  213 nodes to be scored  (772 eliminated genes)
## 
##   Level 12:  370 nodes to be scored  (1767 eliminated genes)
## 
##   Level 11:  663 nodes to be scored  (3882 eliminated genes)
## 
##   Level 10:  953 nodes to be scored  (6056 eliminated genes)
## 
##   Level 9:   1156 nodes to be scored (7727 eliminated genes)
## 
##   Level 8:   1143 nodes to be scored (9641 eliminated genes)
## 
##   Level 7:   1041 nodes to be scored (11150 eliminated genes)
## 
##   Level 6:   825 nodes to be scored  (12270 eliminated genes)
## 
##   Level 5:   505 nodes to be scored  (13030 eliminated genes)
## 
##   Level 4:   266 nodes to be scored  (13499 eliminated genes)
## 
##   Level 3:   111 nodes to be scored  (13667 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13759 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13808 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 1151 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannHT, ontology='BP', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownHT', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 6238 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  9 nodes to be scored    (26 eliminated genes)
## 
##   Level 16:  14 nodes to be scored   (49 eliminated genes)
## 
##   Level 15:  45 nodes to be scored   (120 eliminated genes)
## 
##   Level 14:  94 nodes to be scored   (228 eliminated genes)
## 
##   Level 13:  158 nodes to be scored  (651 eliminated genes)
## 
##   Level 12:  288 nodes to be scored  (1586 eliminated genes)
## 
##   Level 11:  511 nodes to be scored  (3689 eliminated genes)
## 
##   Level 10:  757 nodes to be scored  (5865 eliminated genes)
## 
##   Level 9:   951 nodes to be scored  (7463 eliminated genes)
## 
##   Level 8:   941 nodes to be scored  (9375 eliminated genes)
## 
##   Level 7:   898 nodes to be scored  (11035 eliminated genes)
## 
##   Level 6:   743 nodes to be scored  (12201 eliminated genes)
## 
##   Level 5:   464 nodes to be scored  (12990 eliminated genes)
## 
##   Level 4:   232 nodes to be scored  (13493 eliminated genes)
## 
##   Level 3:   103 nodes to be scored  (13666 eliminated genes)
## 
##   Level 2:   21 nodes to be scored   (13757 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13808 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownHT$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 1199 genes

ResBPUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannHT, ontology='BP', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 6162 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  9 nodes to be scored    (24 eliminated genes)
## 
##   Level 16:  23 nodes to be scored   (41 eliminated genes)
## 
##   Level 15:  53 nodes to be scored   (135 eliminated genes)
## 
##   Level 14:  93 nodes to be scored   (318 eliminated genes)
## 
##   Level 13:  164 nodes to be scored  (673 eliminated genes)
## 
##   Level 12:  278 nodes to be scored  (1549 eliminated genes)
## 
##   Level 11:  493 nodes to be scored  (3657 eliminated genes)
## 
##   Level 10:  721 nodes to be scored  (5675 eliminated genes)
## 
##   Level 9:   914 nodes to be scored  (7218 eliminated genes)
## 
##   Level 8:   935 nodes to be scored  (9260 eliminated genes)
## 
##   Level 7:   900 nodes to be scored  (10921 eliminated genes)
## 
##   Level 6:   730 nodes to be scored  (12164 eliminated genes)
## 
##   Level 5:   468 nodes to be scored  (12977 eliminated genes)
## 
##   Level 4:   248 nodes to be scored  (13493 eliminated genes)
## 
##   Level 3:   105 nodes to be scored  (13667 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13759 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13807 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllHT$ResSel, TopGOResDown=ResBPDownHT$ResSel, TopGOResUp = ResBPUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllHT$ResSel, TopGOResDown = ResBPDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 2350 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannHT <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannHT, ontology='MF', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1454 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  19 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  31 nodes to be scored   (48 eliminated genes)
## 
##   Level 9:   75 nodes to be scored   (247 eliminated genes)
## 
##   Level 8:   143 nodes to be scored  (1458 eliminated genes)
## 
##   Level 7:   242 nodes to be scored  (3639 eliminated genes)
## 
##   Level 6:   300 nodes to be scored  (4571 eliminated genes)
## 
##   Level 5:   318 nodes to be scored  (6463 eliminated genes)
## 
##   Level 4:   232 nodes to be scored  (9395 eliminated genes)
## 
##   Level 3:   69 nodes to be scored   (11655 eliminated genes)
## 
##   Level 2:   17 nodes to be scored   (12371 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14249 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 1151 genes

ResMFDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannHT, ontology='MF', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1184 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  19 nodes to be scored   (29 eliminated genes)
## 
##   Level 9:   61 nodes to be scored   (200 eliminated genes)
## 
##   Level 8:   112 nodes to be scored  (1351 eliminated genes)
## 
##   Level 7:   189 nodes to be scored  (3574 eliminated genes)
## 
##   Level 6:   234 nodes to be scored  (4419 eliminated genes)
## 
##   Level 5:   264 nodes to be scored  (6210 eliminated genes)
## 
##   Level 4:   208 nodes to be scored  (9123 eliminated genes)
## 
##   Level 3:   63 nodes to be scored   (11536 eliminated genes)
## 
##   Level 2:   17 nodes to be scored   (12355 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14246 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownHT$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 1199 genes

ResMFUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannHT, ontology='MF', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1130 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  17 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  24 nodes to be scored   (29 eliminated genes)
## 
##   Level 9:   52 nodes to be scored   (236 eliminated genes)
## 
##   Level 8:   106 nodes to be scored  (1388 eliminated genes)
## 
##   Level 7:   171 nodes to be scored  (3537 eliminated genes)
## 
##   Level 6:   227 nodes to be scored  (4397 eliminated genes)
## 
##   Level 5:   252 nodes to be scored  (6115 eliminated genes)
## 
##   Level 4:   198 nodes to be scored  (9100 eliminated genes)
## 
##   Level 3:   61 nodes to be scored   (11574 eliminated genes)
## 
##   Level 2:   16 nodes to be scored   (12342 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14231 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllHT$ResSel, TopGOResDown=ResMFDownHT$ResSel, TopGOResUp=ResMFUpHT$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllHT$ResSel, TopGOResDown = ResMFDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 2350 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannHT <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannHT, ontology='CC', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 818 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  35 nodes to be scored   (47 eliminated genes)
## 
##   Level 10:  86 nodes to be scored   (77 eliminated genes)
## 
##   Level 9:   125 nodes to be scored  (822 eliminated genes)
## 
##   Level 8:   130 nodes to be scored  (2908 eliminated genes)
## 
##   Level 7:   129 nodes to be scored  (5313 eliminated genes)
## 
##   Level 6:   113 nodes to be scored  (8895 eliminated genes)
## 
##   Level 5:   81 nodes to be scored   (10622 eliminated genes)
## 
##   Level 4:   53 nodes to be scored   (12694 eliminated genes)
## 
##   Level 3:   55 nodes to be scored   (13996 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14423 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 1151 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannHT, ontology='CC', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 676 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  26 nodes to be scored   (35 eliminated genes)
## 
##   Level 10:  70 nodes to be scored   (65 eliminated genes)
## 
##   Level 9:   107 nodes to be scored  (761 eliminated genes)
## 
##   Level 8:   110 nodes to be scored  (2822 eliminated genes)
## 
##   Level 7:   103 nodes to be scored  (5220 eliminated genes)
## 
##   Level 6:   96 nodes to be scored   (8853 eliminated genes)
## 
##   Level 5:   66 nodes to be scored   (10575 eliminated genes)
## 
##   Level 4:   42 nodes to be scored   (12676 eliminated genes)
## 
##   Level 3:   49 nodes to be scored   (13994 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14422 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownHT$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 1199 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannHT, ontology='CC', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 664 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  26 nodes to be scored   (47 eliminated genes)
## 
##   Level 10:  66 nodes to be scored   (69 eliminated genes)
## 
##   Level 9:   92 nodes to be scored   (761 eliminated genes)
## 
##   Level 8:   100 nodes to be scored  (2623 eliminated genes)
## 
##   Level 7:   105 nodes to be scored  (5044 eliminated genes)
## 
##   Level 6:   95 nodes to be scored   (8743 eliminated genes)
## 
##   Level 5:   76 nodes to be scored   (10591 eliminated genes)
## 
##   Level 4:   49 nodes to be scored   (12677 eliminated genes)
## 
##   Level 3:   45 nodes to be scored   (13996 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14422 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllHT$ResSel, TopGOResDown=ResCCDownHT$ResSel, TopGOResUp=ResCCUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllHT$ResSel, TopGOResDown = ResCCDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
ResSelBP_HT <- list(ResSelAll = ResBPAllHT$ResSel,
                    ResSelUp = ResBPUpHT$ResSel,
                    ResSelDown = ResBPDownHT$ResSel)

ResSelMF_HT <- list(ResSelAll = ResMFAllHT$ResSel,
                    ResSelUp = ResMFUpHT$ResSel,
                    ResSelDown = ResMFDownHT$ResSel)

ResSelCC_HT <- list(ResSelAll = ResCCAllHT$ResSel,
                    ResSelUp = ResCCUpHT$ResSel,
                    ResSelDown = ResCCDownHT$ResSel)

ResSel_HT <- list(BP = ResSelBP_HT,
                  MF = ResSelMF_HT,
                  CC = ResSelCC_HT)

saveRDS(ResSel_HT, paste0(SavingFolder, '/day50CbO.', 'ResSel_HT.rds'))
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/day50CbO.', 'FunctionalAnalysisWorkspace_HT.RData'))

last update on: Fri Jan 10 21:02:13 2025